General and Domain-adaptive Chinese Spelling Check with Error-consistent Pretraining

Abstract

The lack of labeled data is one of the significant bottlenecks for Chinese Spelling Check. Existing research uses automatic generation methods that exploit unlabeled data to expand the supervised corpus. However, there is a big gap between the real input scenario and the automatically generated corpus. Thus, we develop a competitive general speller, ECSpell, which adopts an Error-consistent masking strategy to create data for pretraining. This error-consistent masking strategy is used to specify the error types of automatically generated sentences so that they are consistent with the real scene. The experimental result indicates that our model outperforms previous state-of-the-art models on the general benchmark. Moreover, spellers often work within a particular domain in real life. Due to many uncommon domain terms, experiments on our built domain-specific datasets show that general models perform terribly. Inspired by the common practice of input methods, we propose to add an alterable user dictionary to handle the zero-shot domain-adaption problem. Specifically, we attach a User Dictionary guided inference module (UD) to a general token classification-based speller. Our experiments demonstrate that ECSpell^UD, namely, ECSpell combined with UD, surpasses all other baselines broadly, even approaching the performance on the general benchmark.

Similar resources

NLPTEA 2017 Shared Task - Chinese Spelling Check

This paper provides an overview, along with our findings, of the Chinese Spelling Check shared task at NLPTEA 2017. The goal of this task is to develop a computer-assisted system to automatically diagnose typing errors in traditional Chinese sentences written by students. We defined six types of errors which belong to two categories. Given a sentence, the system should detect where the errors are,...

An Adaptive Error-Check Solution

Because traditional methods of video error checking are ill-suited to network environments in which the packet-loss rate changes frequently, this paper proposes an adaptive error checking solution based on estimating the trend of the packet-loss rate. By setting a threshold value for the packet-loss rate, the solution takes advantage of improved auto repeat request supported b...

A Maximum Entropy Approach to Chinese Spelling Check

Spelling check identifies incorrectly written words in documents. Because of its input methods, Chinese spelling check differs considerably from English spelling check and remains a challenging task. Over the past decade, most methods for detecting errors in documents have been lexicon-based or probability-based, and much progress has been made. In this paper, we propose a new method in Chinese spelling che...

Sinica-IASL Chinese spelling check system at Sighan-7

We developed a Chinese spelling check system for error detection and error correction subtasks in the 2013 SIGHAN-7 Chinese Spelling Check Bake-off. By using the resources of Chinese phonology and orthographic components, our system contains four parts: high confidence pattern matcher, the detection module, the correction module, and the merger. We submitted 2 official runs for both subtasks. T...

NTOU Chinese Spelling Check System in Sighan-8 Bake-off

This paper describes the details of the NTOU Chinese spelling check system in the SIGHAN-8 Bake-off. Besides the basic architecture of the previous system, which participated in the last two CSC tasks, three new preference rules were proposed to deal with Simplified Chinese characters, variants, sentence-final particles, and DE-particles. A new sentence likelihood function was proposed based on frequencies of space-r...

Journal

Journal title: ACM Transactions on Asian and Low-Resource Language Information Processing

Year: 2023

ISSN: 2375-4699, 2375-4702

DOI: https://doi.org/10.1145/3564271